
Conversation

@AgainstEntropy

Also updated the test case in test-backend-ops.
But since the F32 kernel type is not supported on CPU, only GGML_TYPE_F16 is kept; GGML_TYPE_F32 can be uncommented in the future.

…types and improve parameter handling

- Introduced a `conv2d_transpose_params` struct for better parameter management.
- Updated `conv2d_transpose_kernel` to be templated for different kernel types (float and half).
- Modified `ggml_cuda_conv_2d_transpose_p0` to handle both F16 and F32 kernel types.
- Enhanced test cases to validate functionality for both kernel types.
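For context, a refactor like the one described above might take roughly this shape. This is a minimal sketch: the struct fields, indexing, and dispatch are assumptions for illustration, not the PR's actual code.

    // Sketch only: field names, indexing, and launch shape are illustrative.
    #include <cuda_fp16.h>

    struct conv2d_transpose_params {
        int in_w, in_h, out_w, out_h; // input/output spatial sizes
        int kern_w, kern_h;           // kernel spatial size
        int stride;
        int c_in, c_out, batches;
    };

    template <typename kernel_t>
    static __global__ void conv2d_transpose_kernel(
            const float * input, const kernel_t * kernel, float * output,
            const conv2d_transpose_params p) {
        const int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= p.out_w * p.out_h * p.c_out * p.batches) {
            return;
        }
        float total = 0.0f;
        // ... walk the contributing input/kernel taps for this output element,
        // upcasting each tap so accumulation always happens in f32:
        //     total += input[in_idx] * (float) kernel[kern_idx];
        output[idx] = total;
    }

    // The host-side ggml_cuda_conv_2d_transpose_p0 would then dispatch on the
    // kernel tensor's type: <half> for GGML_TYPE_F16, <float> for GGML_TYPE_F32.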
…ernel types

- Updated `test_conv_transpose_2d` structure to improve parameter handling by reordering constructor arguments.
- Enhanced test case generation to iterate over kernel types, allowing for flexible testing of different configurations.
- Removed hardcoded kernel type instances in favor of a loop for better maintainability and scalability.
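Concretely, the generation loop might look something like this, modeled on how test-backend-ops parameterizes other ops; the exact shapes and constructor argument order are assumptions.

    // Sketch only: shapes and argument order are illustrative.
    for (ggml_type kernel_type : { GGML_TYPE_F16, GGML_TYPE_F32 }) {
        test_cases.emplace_back(new test_conv_transpose_2d({ 3, 2, 3, 1 }, { 2, 2, 1, 3 }, 1, kernel_type));
        test_cases.emplace_back(new test_conv_transpose_2d({ 10, 10, 9, 1 }, { 3, 3, 1, 9 }, 2, kernel_type));
    }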
@github-actions github-actions bot added the testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), and ggml (changes relating to the ggml tensor library for machine learning) labels on Nov 8, 2025
Collaborator

@am17an am17an left a comment


Does this PR make a difference to something? From what I understand, the kernel value is upcast into float before doing any accumulation (and accumulation is in f32 anyway). So unless there are kernels around that don't fit into f16, I don't see a benefit to supporting this, especially when we don't support f16 inputs yet (which, incidentally, might be more relevant than kernels being f32, as we could potentially do half2 multiplications).
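The upcast-before-accumulation pattern referred to here looks roughly like the following (an illustrative helper, not the repository's exact loop):

    #include <cuda_fp16.h>

    // Illustrative only: even with an f16 kernel, each tap is upcast to f32
    // before the multiply-accumulate, so the accumulator stays f32 throughout.
    __device__ float accumulate_taps(const float * input, const half * kernel, int n) {
        float total = 0.0f;
        for (int i = 0; i < n; ++i) {
            total += input[i] * __half2float(kernel[i]);
        }
        return total;
    }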

@AgainstEntropy
Author

AgainstEntropy commented Nov 12, 2025

Does this PR make a difference to something? From what I understand, the kernel value is upcast into float before doing any accumulation (and accumulation is in f32 anyway). So unless there are kernels around that don't fit into f16, I don't see a benefit to supporting this, especially when we don't support f16 inputs yet (which, incidentally, might be more relevant than kernels being f32, as we could potentially do half2 multiplications).

So the motivations for this PR are:

  1. Currently, ggml_backend_cuda_device_supports_op always returns true for GGML_OP_CONV_TRANSPOSE_2D without checking the kernel type, which can cause crashes when the op is actually computed. This PR fixes this mismatch (a sketch of a type-aware check follows this list).

    case GGML_OP_CONV_TRANSPOSE_2D:
    case GGML_OP_POOL_2D:
    case GGML_OP_ACC:
        return true;

  2. Some recent models are natively BF16, and using an F16 kernel can lead to overflow. F32 is safe here and can readily be used for precision verification.
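For reference, a type-aware version of the check quoted in point 1 could look roughly like this (a sketch assuming src[0] holds the kernel tensor, not the code as merged):

    case GGML_OP_CONV_TRANSPOSE_2D:
        // Only claim support for kernel types the CUDA path actually implements,
        // so anything else falls back to another backend instead of crashing.
        return op->src[0]->type == GGML_TYPE_F16 || op->src[0]->type == GGML_TYPE_F32;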

@am17an
Collaborator

am17an commented Nov 13, 2025

So the motivations for this PR are:

  1. Currently, ggml_backend_cuda_device_supports_op always returns true for GGML_OP_CONV_TRANSPOSE_2D without checking the kernel type, which can cause crashes when the op is actually computed. This PR fixes this mismatch.

That's because it matches the CPU capabilities exactly.

  1. Some recent models are natively BF16, and using an F16 kernel can lead to overflow. F32 is safe here and can readily be used for precision verification.

That would be a problem in a conversion to GGUF, not necessarily a problem to be solved here.

@am17an
Collaborator

am17an commented Nov 13, 2025

You should add the CPU version for the f32 kernel too; that way this PR makes more sense.

…nhancing flexibility for different data types.

Update test cases to include both F16 and F32 tensor types for comprehensive coverage.
@AgainstEntropy AgainstEntropy changed the title from "CUDA: support F32 kernel type for CONV_TRANSPOSE_2D" to "CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D" on Nov 14, 2025
…pose_params struct and dispatching with direct kernel launch.
…d kernel type for improved flexibility with F16 and F32 data types.
@AgainstEntropy
Author

Hi @am17an, thanks for reviewing this PR.

Here’s what has been updated:

  1. Simplified CUDA kernel dispatch logic.
  2. Renamed type_kernel to kernel_type.
  3. Introduced a templated ggml_compute_forward_conv_2d_transpose_impl to reduce duplication.
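As a rough illustration of point 3, a templated worker can collapse the two per-type implementations into one; the names and signatures below are assumptions, not the PR's exact code.

    // Sketch only: one templated body serves both kernel types; the only
    // per-type difference is how a kernel tap is converted to f32.
    // Assumes ggml's fp16 type and conversion helper (ggml_fp16_t, GGML_FP16_TO_FP32).
    static inline float tap_to_f32(float v)       { return v; }
    static inline float tap_to_f32(ggml_fp16_t v) { return GGML_FP16_TO_FP32(v); }

    template <typename kernel_t>
    static void conv_2d_transpose_accumulate(const float * src, const kernel_t * kern, float * dst, int n) {
        for (int i = 0; i < n; ++i) {
            dst[i] += src[i] * tap_to_f32(kern[i]); // accumulation stays in f32
        }
    }

    // ggml_compute_forward_conv_2d_transpose would then dispatch:
    //     <ggml_fp16_t> for GGML_TYPE_F16 kernels, <float> for GGML_TYPE_F32.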

Please let me know if there’s anything else you’d like changed.

Collaborator

@am17an am17an left a comment


I'm okay with the CUDA changes. Either @ggerganov or @slaren has to approve the ggml-cpu changes.
